ABSTRACT

Background and objectives: Is there a trade-off between children ever born (CEB) and post-reproductive 
lifespan in humans? Here, we report a comprehensive analysis of reproductive trade-offs in the 
Framingham Heart Study (FHS) dataset using phenotypic and genotypic correlations and a genome- 
wide association study (GWAS) to look for single-nucleotide polymorphisms (SNPs) that are related to 
the association between CEB and lifespan. 

Methodology: We calculated the phenotypic and genetic correlations of lifespan with CEB for men and 
women in the Framingham dataset, and then performed a GWAS to search for SNPs that might affect 
the relationship between post-reproductive lifespan and CEB. 

Results: We found significant negative phenotypic correlations between CEB and lifespan in both
women (r_P = -0.133, P < 0.001) and men (r_P = -0.079, P = 0.036). The genetic correlation was large,
highly significant and strongly negative in women (r_G = -0.877, P = 0.009) in a model without covariates,
but not in men (P = 0.777). The GWAS identified five SNPs associated with the relationship between
CEB and post-reproductive lifespan in women; some are near genes that have been linked to cancer.
None were identified in men.

Conclusions and implications: We identified several SNPs for which the relationship between CEB and
post-reproductive lifespan differs by genotype in women in the FHS who were born between 1889 and
1958. That result was not robust to changes in the sample. Further studies on larger samples are needed
to validate the antagonistic pleiotropy of these genes.
BACKGROUND AND OBJECTIVES

Both the theory of life-history evolution and the 
evolutionary theory of aging assume a trade-off be- 
tween reproduction and survival: a cost of 
reproduction paid in lifespan [1-4]. Although well 
documented in model organisms, the existence of 
this trade-off in humans has been controversial (e.g. 
[5]). Negative [6-11], positive [12-17], U-shaped 
[18-20] and mixed or insignificant [21-27] relation- 
ships between completed family size and lifespan 
have all been found. Some results have been 
criticized on statistical grounds; some authors 
doubt that the trade-off exists at all (e.g. [28-32]). 
Two papers suggest that the cost is only expressed in 
women of low social class or nutritional status; a 
similar effect has been found in model organisms 
[5, 21, 27]. 

Although most of the attempts to measure the 
trade-off in humans are based on phenotypic correl- 
ations, the standard of evidence for the existence of a 
trade-off in evolutionary analyses of model organ- 
isms is a negative genetic correlation demonstrated 
as a correlated response to selection (e.g. [5, 33]). 
Such experiments reveal genetic relationships often 
hidden by phenotypic plasticity. This standard can- 
not be met in humans, where experimental evolution 
is not possible. 

Two other types of genetic evidence, however, are 
available in humans. First, genetic correlations can 
be measured with pedigree analysis using methods 
developed for animal breeding. Using such 
methods, Gogele et al. [34] found a significantly 
'positive' genetic correlation between completed 
family size and lifespan in a sample of more than 
5100 men and women who lived between 1658 and 
1907 in South Tyrol, Italy. 

Second, genome-wide association studies
(GWAS) can be done on populations where both
the relevant traits and the single-nucleotide
polymorphisms (SNPs) have been measured. In a
GWAS done on more than 3500 women from
Rotterdam, Kuningas et al. [35] found four
chromosomal regions that influenced completed family
size; none of them appeared also to affect lifespan.

The aims of this analysis of men and women in the
Framingham Heart Study (FHS) were to add to the
genetic information on reproductive trade-offs in
humans by (i) measuring the phenotypic correlation
of lifespan with children ever born (CEB), (ii)
estimating the genetic correlation of lifespan
with CEB and (iii) performing a GWAS to search for
SNPs with effects on the relationship of lifespan to
CEB. We found significantly negative phenotypic and
genetic correlations between post-reproductive
lifespan and CEB in women. We also found five
chromosomal regions mediating the trade-off that were
genome-wide significant in several statistical
models but not when we added smoking as a
covariate. Some of the genes in those five regions
are associated with increased risk of cancer.

METHODOLOGY 

The Framingham Heart Study 

Initiated in 1948 in the town of Framingham (MA), 
the FHS includes three generations of participants 
that continue to be measured. Beginning with 5209 
men and women initially enrolled in the original- 
cohort, the study added 5124 offspring-cohort par- 
ticipants in 1971 that were mostly offspring of the 
original-cohort. In 2002, a third-cohort was added 
consisting of offspring of the second cohort. 
Original-cohort participants have been examined 
every 2 years (28 exams in total to date), the off- 
spring-cohort every 4 years (eight exams in total). 
Participants are mostly of European ancestry (20% 
UK, 40% Ireland, 10% Italy and 10% Quebec). Data 
were de-identified by the FHS. Data-use and human 
subjects' approval were obtained from the National 
Institutes of Health (dbGaP) and the Yale 
Institutional Review Board. 

Phenotypic correlations 

Our sample included men and women who were
born between the 1890s and the 1950s, except for
age at menarche where the available sample was
much smaller (birth years 1923-56). Cox regression was
used to calculate risk of death depending on age
at first birth (n_men = 2579; n_women = 2193), CEB
(n_men = 3833; n_women = 3658), and age at menarche
(n = 1355) and menopause (n = 2415) in women. In
each regression, potentially confounding effects on
lifespan were controlled by including education,
country of origin and smoking status. To test for
potential nonlinear effects, a separate regression
was run with a quadratic term included for the main
predictor traits. If quadratic terms were significant,
this was explored further by examining the Cox
regression model (from the survival library in R)
using penalized splines (with 4 df) [36, 37].

The Cox proportional hazards model is a standard
tool for survival analysis, in which the log of the
hazard function $h(t)$ is assumed to be a linear
combination of the covariates. Specifically, for a model
containing $p$ covariates $x_1, \ldots, x_p$, the fitted model
takes the form of

$$h(t) = h_0(t)\exp(\beta_1 x_1 + \cdots + \beta_p x_p),$$

where $\beta_i$ is the coefficient fit to covariate $x_i$ and
$h_0(t)$ is the unknown baseline hazard function.
Equivalently, this equation can be expressed as

$$\ln\left(\frac{h(t)}{h_0(t)}\right) = \beta_1 x_1 + \cdots + \beta_p x_p.$$
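As a small worked illustration of how a fitted Cox coefficient is read, the sketch below converts a coefficient into a hazard ratio. The function name `hazard_ratio` is hypothetical, and the coefficient is simply the logarithm of the adjusted per-child ratio reported for women in the Results (1.045); it assumes the proportional hazards form, in which the baseline hazard cancels out of the ratio.

```python
import math


def hazard_ratio(beta: float, delta_x: float = 1.0) -> float:
    """Multiplicative change in hazard when one covariate rises by delta_x.

    Under a proportional hazards model the baseline hazard cancels,
    so the ratio depends only on beta and the change in the covariate.
    """
    return math.exp(beta * delta_x)


# Illustrative coefficient: log of the adjusted per-child ratio for women (1.045).
beta_ceb = math.log(1.045)

one_child = hazard_ratio(beta_ceb)          # one additional child
three_children = hazard_ratio(beta_ceb, 3)  # three additional children

print(f"HR per child: {one_child:.3f}; HR for three children: {three_children:.3f}")
```

Because the model is linear on the log-hazard scale, the ratio for several additional children is the per-child ratio raised to that power.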

Note that FHS reports CEB as a value from '0' to
'5', where '5' indicates having had five or more
children. Several variables were pre-adjusted for age and
year measured. For body mass index (BMI), systolic
blood pressure (SBP) and total cholesterol, age and
year effects were removed by taking residuals of each
trait against age (measures between 20 and 60 years
old) and year measured using a generalized additive
model (locally weighted scatterplot smoothing,
LOESS). All residuals for a subject were then
averaged to obtain an average residual for each trait,
and these averages were then used for modelling. As
demonstrated previously, the surface of the
generalized additive model can be accurately
estimated due to the large number of trait
measurements [38].

Our initial sample included 4123 women for
whom data on age at death, CEB, education level,
smoking history, estrogen use and BMI were
available. We then removed 941 women who were born in
or after 1941, a period when the correlation between
lifespan and CEB was weaker, possibly because of
the improvement of health care after World War II.
We did so because, to have a chance of detecting any
significantly correlated SNPs in the GWAS, we
needed to focus on a period where the phenotypic
correlation is relatively strong. Nineteen women
who died before the age of 50 years were also
excluded, because their CEB records might
represent incomplete observations. Because we excluded
women who died before the age of 50 years, we are
specifically studying the relationship of CEB to
post-reproductive mortality. Of the remaining 3163
women, keeping only those who had genotype data
reduced our sample size to 1810. We required this
sample to have associated genotype data because
we later used the same sample for the GWAS. Note
that our phenotypic analysis used the year 1919 as a
cut-off because the yearly ratio of individuals alive to
individuals deceased increased to about 50% in
1919, and continued to rise thereafter.
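The exclusion steps described above can be sketched as a sequence of filters. This is only an illustration on toy records: the field names (`birth_year`, `age_at_death`, `has_genotype`) are hypothetical, and treating a missing age at death as "still alive at last observation" is an assumption rather than something stated in the text.

```python
def apply_exclusions(women):
    """Apply the birth-year, early-death and genotype filters in order."""
    # Keep women born before 1941.
    kept = [w for w in women if w["birth_year"] < 1941]
    # Drop women who died before age 50 (None is assumed to mean still alive).
    kept = [w for w in kept if w["age_at_death"] is None or w["age_at_death"] >= 50]
    # Keep only women with genotype data.
    kept = [w for w in kept if w["has_genotype"]]
    return kept


# Toy records, invented purely for illustration.
toy = [
    {"id": 1, "birth_year": 1905, "age_at_death": 82, "has_genotype": True},
    {"id": 2, "birth_year": 1945, "age_at_death": None, "has_genotype": True},
    {"id": 3, "birth_year": 1910, "age_at_death": 47, "has_genotype": True},
    {"id": 4, "birth_year": 1920, "age_at_death": None, "has_genotype": False},
    {"id": 5, "birth_year": 1930, "age_at_death": None, "has_genotype": True},
]
kept_ids = [w["id"] for w in apply_exclusions(toy)]

# Arithmetic check on the counts reported in the text.
remaining_before_genotype_filter = 4123 - 941 - 19
print(kept_ids, remaining_before_genotype_filter)
```

The arithmetic check reproduces the 3163 women reported before the genotype filter reduces the sample to 1810.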

For illustrative purposes, we also ran a multiple 
linear regression on a smaller sample for women, 
including only the deceased subjects who were born 
prior to 1919 (n = 680) out of a total of 1810 who 
satisfied specific criteria outlined above. 

We similarly ran a regression model on a smaller 
sample of men who have died (n = 712) out of a total 
of 1474 men satisfying similar criteria. 

Genetic correlations and heritabilities

We estimated heritabilities and genetic correlations 
for traits from pedigrees using a mixed effects 
restricted maximum likelihood (REML) model in 
ASReml version 3.0 [39]. We considered models in 
which there were no covariates as well as adjusted 
models where phenotypic variation was partitioned 
into additive genetic, residual variance and a single 
random effect (maternal ID, paternal ID or educa- 
tion level). To be consistent with the phenotypic cor- 
relation models, we also considered models in 
which fixed effects (smoking status and country of 
origin) and both random effects for maternal ID and 
education level were included. Sex was not included 
as a fixed effect as male and female estimates were 
obtained separately. Smoking status (0/1, non- 
smoker/smoker) and country of origin (0/1, US 
born/foreign born) were coded as binary variables. 
Education described number of years completed,
with missing values coded as 8 years (the minimum).
Maternal variance components ranged from
0.0 (age at first birth) to 0.12 ± 0.04 (lifespan) and 0.0
(age at first birth) to 0.20 ± 0.03 (lifespan) for female
and male analyses, respectively. Education variance
components ranged from 0.0 (age at menarche) to
0.06 ± 0.03 (CEB) and 0.0 (age at first birth) to
0.014 ± 0.009 (CEB) for female and male analyses,
respectively. The Framingham pedigree totals
15 877 individuals in 1538 pedigrees consisting of
both immediate and extended family. Heritability
estimates were tested for significance with likelihood
ratios that compared full models with reduced
ones lacking the additive genetic component
(i.e. $\chi^2_{1\,\mathrm{df}} = 2 \times (\mathrm{LogL}_{\mathrm{FULL}} - \mathrm{LogL}_{\mathrm{REDUCED}})$).
Genetic correlations were also tested for significance by
comparing likelihood values from full models to
ones where the genetic covariance was fixed at zero.
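The likelihood-ratio test described above can be sketched in a few lines. The function name and the log-likelihood values are hypothetical and serve only as an illustration; the P-value uses the identity that the upper tail of a chi-square distribution with 1 df equals erfc(sqrt(x/2)).

```python
import math


def lrt_pvalue_1df(loglik_full: float, loglik_reduced: float):
    """Likelihood-ratio statistic and its chi-square (1 df) P-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    # For 1 df: P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods from a full and a reduced model.
stat, p_value = lrt_pvalue_1df(-1000.0, -1001.920729410347)
print(f"chi-square = {stat:.3f}, P = {p_value:.3f}")
```

The same function applies to the genetic-correlation test, where the reduced model fixes the genetic covariance at zero instead of dropping the additive genetic component.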

Our genetic correlation analysis between CEB and
lifespan included a total of 5133 females for whom
age at death and CEB information were available.
Supplementary Fig. S4 summarizes the pedigree
information for these women, grouped by cohort via
the 'pedantics' package in R [40]. Pedigree depths
(computed using the same package) for the
Framingham dataset range from 0 to 4, with mean
1.02 (±1.06). On average, each woman had 2.38
(±1.59) children in her lifetime and lived 77.21
(±12.73) years. The average level of education in
years was 11.66. The average age at menarche was
12.81 (±1.54), average age at first birth was 26.49
(±4.81) and average age at menopause was 49.20
(±4.10).

Genome-wide association study 

Our association results are based on 444 205 SNPs
from the 500 K and 50 K Affymetrix samples that
satisfied the following criteria: call rate >90%,
Hardy-Weinberg equilibrium P-value >0.00001,
Mendel error rate <2% and minor allele frequency
>0.01. These SNP selection criteria are further
discussed in the Supplementary Information.
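The four SNP selection criteria can be written as a single predicate. This is a minimal sketch: the function and argument names are hypothetical, and the thresholds are the ones quoted above, with rates expressed as fractions.

```python
def passes_qc(call_rate: float, hwe_p: float, mendel_error_rate: float, maf: float) -> bool:
    """Return True if a SNP meets all four selection criteria."""
    return (
        call_rate > 0.90              # call rate > 90%
        and hwe_p > 1e-5              # Hardy-Weinberg equilibrium P-value > 0.00001
        and mendel_error_rate < 0.02  # Mendel error rate < 2%
        and maf > 0.01                # minor allele frequency > 0.01
    )


# Invented example values.
print(passes_qc(call_rate=0.97, hwe_p=0.2, mendel_error_rate=0.001, maf=0.15))
print(passes_qc(call_rate=0.97, hwe_p=0.2, mendel_error_rate=0.001, maf=0.005))
```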

We used Cox proportional hazards models, as
done in the phenotypic correlation analysis, to
estimate the interactions between survival time past age
50 years, CEB and genotype. For censored
individuals, we used their times of last observation past age
50 years as their censoring time.

Several models were run under this setup, which 
we number to emphasize that they are nested 
models. Model 1 did not adjust for any covariates. 
We then added covariates to reduce confounding by 
variables that may be correlated with lifespan and 
CEB. Model 2 used education level. Model 3 further 
added BMI, estrogen use and cohort as covariates. 
Models 4a-d were intermediate steps in which one 
of the four additional covariates was added: blood 
pressure treatment indicator (Model 4a), total chol- 
esterol (Model 4b), SBP (Model 4c) and smoking 
indicator (Model 4d). Model 5 included all four of 
these additional covariates. Models 4a-d were run 
retrospectively to pinpoint which covariate, when 
added, resulted in removing significance from all 
SNPs. A summary of the models fitted can be found 
in the Supplementary Information. 
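The nesting of the models can be made explicit by listing each model's covariates. The sketch below assumes, as the description suggests, that Models 4a-d each add one covariate to Model 3 and that Model 5 adds all four; the covariate labels are hypothetical shorthand, not variable names from the FHS dataset.

```python
# Covariates of Model 3 (Model 2 plus BMI, estrogen use and cohort).
MODEL_3 = ["education", "bmi", "estrogen_use", "cohort"]

# The four additional covariates, one per intermediate model.
EXTRA = {
    "4a": "bp_treatment",
    "4b": "total_cholesterol",
    "4c": "sbp",
    "4d": "smoking",
}

MODELS = {"1": [], "2": ["education"], "3": list(MODEL_3)}
for name, covariate in EXTRA.items():
    MODELS[name] = MODEL_3 + [covariate]      # assumed: Model 3 plus one covariate
MODELS["5"] = MODEL_3 + list(EXTRA.values())  # Model 3 plus all four


def is_nested(smaller: str, larger: str) -> bool:
    """True if every covariate of `smaller` also appears in `larger`."""
    return set(MODELS[smaller]) <= set(MODELS[larger])


print(is_nested("3", "4d"), is_nested("4d", "5"), is_nested("4a", "4b"))
```

Under this reading, each of Models 4a-d is nested between Model 3 and Model 5, but the four intermediate models are not nested within one another.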

Both genotypes and CEB were included as
continuous variables to model an additive effect of the
minor allele. We used both the raw genotypes
provided by FHS as well as an imputed dataset.
The imputation was done in several stages. First, 
we incorporated values imputed by MACH that were 
included in the FHS dataset. The MACH algorithm 
imputes missing genotypes based on shared haplo- 
type stretches between subjects and HapMap data 
[41]. Of the remaining missing values, we sampled 
among the possible genotypes given the genotypes 
of parents, when parent genotypes were available. 
Any remaining missing values were simply sampled 
according to genotype proportions of the entire 
group. This sequence of operations created a full 
set of genotypes that had no missing values. 
Cohort was defined as a categorical variable 
computed from the year of birth: born before or in 
1917 and born in or after 1918. 
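The later imputation stages (sampling from parental genotypes, then from overall genotype proportions) can be sketched as follows. This is not the authors' code: genotypes are coded here as minor-allele counts 0/1/2, parental sampling is assumed to require both parents and to follow simple Mendelian transmission, and all function names are hypothetical. The MACH stage is not reproduced.

```python
import random


def sample_from_parents(mother: int, father: int, rng: random.Random) -> int:
    """Sample a child's genotype (minor-allele count) given both parents."""
    def transmitted(parent_genotype: int) -> int:
        if parent_genotype == 0:
            return 0              # can only pass the major allele
        if parent_genotype == 2:
            return 1              # can only pass the minor allele
        return rng.randint(0, 1)  # heterozygote passes either allele
    return transmitted(mother) + transmitted(father)


def sample_from_population(proportions, rng: random.Random) -> int:
    """Sample a genotype according to group-wide genotype proportions."""
    return rng.choices([0, 1, 2], weights=proportions, k=1)[0]


def impute(genotype, mother, father, proportions, rng: random.Random) -> int:
    """Fill one missing genotype, preferring parental information."""
    if genotype is not None:
        return genotype
    if mother is not None and father is not None:
        return sample_from_parents(mother, father, rng)
    return sample_from_population(proportions, rng)


rng = random.Random(0)
print(impute(None, 1, 1, [0.6, 0.3, 0.1], rng))
```

Because the last two stages draw genotypes at random, repeated runs with different seeds give different completed datasets.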

In addition to running the above five models on
the full sample of 1810, we tested our models for
robustness by mimicking an out-of-sample analysis.
To that end, we randomly divided our sample into
two equal parts and fitted Models 1-5 to each part
separately to check for consistency in significance of
the top performing SNPs. A true out-of-sample
performance check would include the calculation of
prediction error based on a model fitted on a training
set. Our method does not aim to validate prediction
out of sample, but rather to check that a SNP
found to be significant in one sample is also
significant in another sample, a less stringent but
still important requirement of consistency. To
minimize the effects of missing genotypes on each
subsample, which would further lower our sample size
in each of the two separate runs, we only used the
imputed genotypes for this portion of our analysis.
The downside of using imputed genotypes is the risk
of imputation error. To verify that our risk of
imputation error is low, we used the imputed SNP data to
repeat our full-sample analyses for Models 1-5. Our
aim was to show that our results for these models
are similar, regardless of whether we used imputed
or raw SNP data.
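A minimal sketch of the split-sample check follows. The helper names and the seed are hypothetical; the consistency rule shown (significant in both halves at the same threshold) is one plausible reading of the requirement described above, not a confirmed detail of the analysis.

```python
import random


def split_in_half(subject_ids, seed: int):
    """Randomly divide subjects into two halves of (nearly) equal size."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]


def consistent(p_half1: float, p_half2: float, threshold: float) -> bool:
    """A SNP is called consistent only if it is significant in both halves."""
    return p_half1 < threshold and p_half2 < threshold


half1, half2 = split_in_half(range(1810), seed=42)
print(len(half1), len(half2))

# Illustrative P-values for one SNP in the two halves.
print(consistent(3.2e-4, 9.39e-8, threshold=1.13e-7))
```

Repeating the split with different seeds, as was done 100 times in the analysis, guards against a result that depends on one particular random assignment.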

To explore possible non-additive genotypic
effects, we ran a separate Model 6 that used
genotype as a categorical variable. The covariates used in
Model 6 are identical to those used in Model 3, and
any SNPs for which the homozygous minor
genotype had fewer than 20 counts were excluded. We
did not apply the half-sample testing to Model 6,
because in many cases, the genotype counts in the
homozygous minor allele category were too small to
further subdivide the group for categorical
modelling.

Finally, we ran two additional models that are
outside of the nested framework given above on the raw
data only (and therefore, they are not numbered).
A quadratic model was run to search for a possible
nonlinear effect by adding a quadratic CEB term
along with its interaction with genotype to
Model 1. The 'matching covariates' model was run
to provide a frame of reference to the reader; this
model uses exactly the same covariates that were
included in the phenotypic and genotypic correlation
analyses: education, smoking indicator and
country of origin.

RESULTS 

Phenotypic correlations 

In the Cox regression analysis where as many men 
and women were included as possible (birth-year 
range 1889-1958), censoring was used to account 
for those who were still alive according to the latest 
medical records. Risk of mortality beyond age 50 
years increased if women (adjusted incidence rate 
ratio (RR) = 1.045, P = 0.030) had more children 
(Table 1). When a nonlinear term for CEB was 
included, it significantly improved the model fit 
and became more significant than the linear term. 



Penalized splines for unadjusted mortality risk 
(Fig. 1) support a predominantly U-shaped pattern 
for the association between CEB and lifespan, simi- 
lar to that found in some other studies (e.g. [19]). 
This is consistent with a cost of reproduction that is 
experienced by women with three or more children 
and with a benefit of reproduction to those who have 
one or two children. Highest mortality risk occurred 
in women with no children or more than three to four 
children, with lowest risk for those with
approximately two. Mortality risk decreased if the first child
was born later (women, unadjusted RR = 0.971,
P < 0.001; men, adjusted RR = 0.985, P = 0.011; see
Supplementary Fig. S1), but the significance of this
effect depended on whether estimates were adjusted
or not (Table 1). Mortality risk was also reduced if
menopause occurred later in women (unadjusted
RR = 0.970, P = 0.003), although this effect
disappeared when other effects were controlled for
(Table 1). Full model results can be seen in
Supplementary Table S1.

In the analysis restricted to the 680 women born
between 1889 and 1918, all of whom had died,
the phenotypic correlation between
CEB and lifespan was highly significant and
negative (r = -0.133, P = 0.0005; Fig. 2). Linear
regression indicated that every additional child cost
0.74 years of lifespan (standard error (SE) = 0.21
years). There was, however, significant variation in



Table 1. Incidence RR (±95% confidence interval) for age at death due to stroke, heart attack or cancer
(beyond age 50 years)

Trait            Women                                                     Men
                 Unadjusted                   Adjusted                     Unadjusted             Adjusted
CEB              1.050* (1.011-1.092) NL**    1.045* (1.005-1.087) NL***   0.995 (0.960-1.033)    1.031 (0.993-1.071)
                 n = 3729                                                  n = 3888
Age first birth  0.971*** (0.955-0.988) NL**  0.977* (0.960-0.994) NL*     0.990 (0.979-1.001)    0.985** (0.974-0.995)
                 n = 2236                                                  n = 2613
Menarche         0.891 (0.757-1.050)          0.917 (0.782-1.077)
                 n = 1367
Menopause        0.970** (0.951-0.990)        0.984 (0.965-1.005)
                 n = 2461

Unadjusted Cox regression estimates included only the main predictor trait. Cultural effects (smoking, education and country-of-origin) were accounted
for in adjusted estimates. 'NL' indicates that a significant nonlinear effect was also detected for the association between this trait and longevity.
*P < 0.05, **P < 0.01, ***P < 0.001.
Figure 1. Summary of CEB and mortality risk in Framingham
women. A histogram of CEB and log-relative mortality risk
values for each CEB value with 95% confidence bands
(n = 5133)

Figure 2. Relationship between CEB and lifespan for women.
Scatterplot illustrating correlation between CEB and lifespan
(r = -0.133, P < 0.001) (n = 680). Both variables have been
jittered to minimize overlap of points



the phenotypic correlation by birth year (Fig. 3); it
was positive (with one exception) from 1893 to 1907
and negative from 1908 to 1913. Many in the earlier
group were giving birth before the Great Depression
and World War II. Some of the latter group
encountered those two major environmental perturbations.
The correlation between CEB and lifespan for the 712
men was slightly negative (r = -0.079, P = 0.0355;
Supplementary Fig. S2). An additional child cost
0.54 years of male lifespan (SE = 0.26 years).
Again, the correlation varied by birth year, but the
variations were less pronounced than for females
(Supplementary Fig. S3). The observation that
Figure 3. Correlation between CEB and lifespan by birth year
for women. Women (n = 680) were grouped by overlapping 
10-year intervals of birth year, and the correlation between 
CEB and lifespan was computed for each group. Individual 
points indicate the sample size of each 10-year group, with 
the mean birth year plotted on the x-axis and correlation 
plotted on the y-axis 

phenotypic correlations are dependent on birth 
year is consistent with previous findings that 
selection pressures changed over time in 
Framingham [38]. 
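As a rough plausibility check on the reported phenotypic correlations, a correlation coefficient can be converted into a test statistic using t = r·sqrt((n − 2)/(1 − r²)). The sketch below is not the authors' procedure: it approximates the t distribution by a standard normal, which is reasonable only because n is large here, so the P-values are approximate.

```python
import math


def correlation_test(r: float, n: int):
    """t statistic for a Pearson correlation and an approximate two-sided P-value."""
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # Normal approximation to the t distribution with n - 2 df (large n only).
    p_approx = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p_approx


t_women, p_women = correlation_test(-0.133, 680)
t_men, p_men = correlation_test(-0.079, 712)
print(f"women: t = {t_women:.2f}, P ~ {p_women:.4f}")
print(f"men:   t = {t_men:.2f}, P ~ {p_men:.4f}")
```

Both approximate P-values fall close to the reported ones (P = 0.0005 for women and P = 0.0355 for men).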

Heritabilities and genetic correlations 

In women (Table 2), the heritabilities of most major
life-history traits differed significantly from zero,
including age at death (h² = 0.12, P = 0.01), CEB
(h² = 0.09, P = 0.03), age at first birth (h² = 0.18,
P < 0.001) and menopause (h² = 0.44, P < 0.001).

In women, the genetic correlation of CEB with age
at death was large, negative and significant
(r_G = -0.88, P = 0.01) in a model without covariates
(Supplementary Table S2). When we included
education as a random effect, the genetic correlation
decreased in magnitude to -0.70 but was still significant
(P = 0.02). When we included either the mother or
the father identifiers in place of education as a
random effect, the genetic covariance remained large
and negative, but was no longer significant (mother:
r_G = -1.58, P = 0.11; father: r_G = -1.46, P = 0.15). The
model in which we adjusted for education, smoking
status and country of origin also produced a large
negative genetic correlation, but the correlation was
not significant (r_G = -0.69, P = 0.14).

The correlation between the quadratic term CEB 2 
and lifespan was large, negative and significant in 
Table 2. Heritabilities (h², on the diagonal) and genetic correlations (r_G, off the diagonal) of life-history
traits (±SE)

                  Age at death      CEB               Age first birth   Menarche          Menopause
Women
Age at death      0.12 ± 0.08       -0.69 ± 0.52      0.20 ± 0.25       0.07 ± 0.23       0.15 ± 0.17
                  P = 0.0176        P = 0.1420        P = 0.2083        P = 0.3886        P = 0.1917
                  n = 3010
CEB                                 0.09 ± 0.05       -0.40 ± 0.35      0.31 ± 0.24       -0.21 ± 0.21
                                    P = 0.0394        P = 0.1545        P < 0.0001        P = 0.1377
                                    n = 4123
Age first birth                                       0.18 ± 0.06       -0.38 ± 0.33      -0.06 ± 0.14
                                                      P = 0.0008        P = 0.0911        P = 0.3541
                                                      n = 2912
Menarche                                                                0.16 ± 0.13       0.10 ± 0.21
                                                                        P = 0.0948        P = 0.3121
                                                                        n = 1638
Menopause                                                                                 0.44 ± 0.06
                                                                                          P < 0.0001
                                                                                          n = 3400
Men
Age at death      <0.01 ± <0.01     <0.01 ± <0.01     <0.01 ± <0.01
                  P = 0.8875        P = 0.7773        P = 0.6101
                  n = 2963
CEB                                 <0.01 ± <0.01     <0.01 ± <0.01
                                    P = 0.5485        P = 0.3884
                                    n = 4051
Age first birth                                       0.12 ± 0.07
                                                      P = 0.0300
                                                      n = 2688

SEs and P-values were obtained from maximum-likelihood estimates. Cultural (smoking, education and country-of-origin) and maternal effects were
accounted for in all estimates. P-values < 0.05 are in bold.



three of four models (no covariates: r_G = -1.09,
P = 0.003; only mother identifier as random effect:
r_G = -1.73, P = 0.04; only education as random
effect: r_G = -0.85, P = 0.01), and borderline
non-significant in the model with only the father identifier
(r_G = -1.61, P = 0.06).

Furthermore, we looked to see if the genetic
correlation between CEB and lifespan was robust to
pedigree depth in the simplest model where no
covariates were included. Including only those
women with pedigree depth of 1 or higher (n = 2540),
we got r_G = -0.46 (P = 0.14) and including only those
women with pedigree depth of 2 or higher (n = 948),
we got r_G = -0.21 (P = 0.60); neither correlation was
significant in the reduced samples.

The genetic correlation of CEB with age at menarche
was relatively large, positive and highly significant
(r_G = 0.31, P < 0.001). In men (Table 2), the
heritability of age at first birth (inferred from their
spouses) was small and only just significant
(h² = 0.12, P = 0.03). All other male heritability and
genetic correlation estimates were non-significant.
Full model results for heritability can be seen in
Supplementary Table S2.

Genome-wide association study 

GWAS results are summarized in Tables 3-10; the
birth years for the 1810 women included in
the GWAS are shown in the Supplementary
Information. We deemed a SNP to be genome-wide
significant if its interaction coefficient with CEB had
a P-value that was less than a Bonferroni-adjusted
threshold of 1.13 × 10⁻⁷ (α = 0.05), unless otherwise
indicated. For females, we found two SNPs that
attained genome-wide significance using the full
Table 3. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of significant
SNPs in Models 1-3 and 5 (full sample)

P-values (genotype × CEB)

Ssid        Rsid       Chr  Position  Near    Model 1     Model 2     Model 3     Model 4        Model 5   Matching covariates
ss66450977  rs6768456  3    27867272  EOMES   4.03E-10 a  4.38E-10 a  8.40E-09 a  (see Table 4)  7.99E-07  4.93E-08 a
ss66475987  rs2575533  4    42432336  ATP8A1  8.02E-08 a  5.30E-08 a  3.06E-06    (see Table 4)  2.49E-05  2.11E-07

n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
a SNP attained genome-wide significance.



Table 4. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of significant
SNPs in Models 4a-d (full sample)

P-values (genotype × CEB)

Ssid        Rsid       Chr  Position  Near    Model 4a    Model 4b    Model 4c    Model 4d
ss66450977  rs6768456  3    27867272  EOMES   1.40E-09 a  7.44E-09 a  8.65E-09 a  4.02E-07
ss66475987  rs2575533  4    42432336  ATP8A1  1.02E-05    3.56E-06    5.23E-06    1.35E-05

n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
a SNP attained genome-wide significance.



Table 5. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of nominally
significant SNPs in Model 6

Ssid        Rsid       Chr  Position  Near     P-value     P-value     Homozygous minor
                                               Aa × CEB    aa × CEB    genotype count
ss66450977  rs6768456  3    27867272  EOMES    1.00E-07    2.40E-03    21
ss66500131  rs1777023  9    92008266  OR7E31P  1.00E-01    3.00E-07    26
ss66392234  rs7132724  12   65001044  HELB     1.30E-01    9.60E-08    102
ss66495977  rs2180957  14   68238574  RAD51B   1.20E-01    8.70E-07    21

n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.



Table 6. GWAS for SNPs that affect the relationship between CEB and lifespan: re-evaluating significant
SNPs in Models 1-3 and 5 (split samples)

            Sample half 1: P-values (genotype × CEB)   Sample half 2: P-values (genotype × CEB)
Ssid        Model 1   Model 2   Model 3   Model 5      Model 1     Model 2     Model 3   Model 5
ss66450977  0.00032   0.00041   0.00097   0.007        9.39E-08 a  7.04E-08 a  1.36E-06  4.58E-06
ss66475987  0.0002    0.00012   0.0021    0.001        5.46E-04    4.46E-04    1.56E-03  1.39E-02

n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
a SNP attained genome-wide significance.
Table 7. GWAS for SNPs that affect the relationship between CEB and lifespan: re-evaluating significant
SNPs in Models 4a-d (split samples)

            Sample half 1: P-values (genotype × CEB)   Sample half 2: P-values (genotype × CEB)
Ssid        Model 4a  Model 4b  Model 4c  Model 4d     Model 4a  Model 4b  Model 4c  Model 4d
ss66450977  8.40E-04  1.30E-03  8.00E-04  7.40E-03     3.33E-06  1.19E-06  1.32E-06  3.35E-06
ss66475987  3.00E-03  9.40E-04  1.80E-03  3.70E-03     2.00E-03  2.30E-03  3.80E-03  3.40E-03

n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.


Table 8. GWAS for SNPs that affect the relationship between CEB and lifespan: top SNPs in Model 5
(split sample)

                                               P-values (genotype × CEB)
Ssid        Rsid        Chr  Position   Sample 1  Sample 2
ss66092635  rs6581676   12   64992353   9.12E-06  4.58E-01
ss66508254  rs2961258   7    15150223   1.41E-05  7.86E-01
ss66392234  rs7132724   12   65001044   1.82E-05  4.86E-01
ss66328248  rs13248967  8    114920075  2.81E-05  6.86E-01
ss66531142  rs11219832  11   124272500  3.65E-05  1.79E-01
ss74823403  rs7860830   9    26882137   3.27E-01  7.19E-10 a
ss66231005  rs10899741  7    52215028   4.62E-01  9.84E-08 a
ss66273879  rs1728810   3    10992443   4.15E-01  1.07E-07 a
ss66526690  rs1602160   6    94277193   9.00E-01  1.57E-07
ss66490007  rs11009744  10   34675601   9.86E-01  2.37E-07

n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
a SNP attained genome-wide significance.











Table 9. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of significant
SNPs in Models 1-3 and 5 (full sample) (imputed SNPs)

P-values (genotype × CEB)

Ssid        Rsid       Chr  Position  Near    Model 1     Model 2     Model 3     Model 4         Model 5
ss66450977  rs6768456  3    27867272  EOMES   2.91E-10 a  2.20E-10 a  6.44E-09 a  (see Table 10)  5.56E-07
ss66475987  rs2575533  4    42432336  ATP8A1  1.50E-07    6.57E-08 a  5.03E-06    (see Table 10)  2.94E-05

n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
a SNP attained genome-wide significance.



sample: ss66450977 on Chromosome 3 (close to
EOMES) and ss66475987 on Chromosome 4 (close
to ATP8A1). Their levels of significance decreased as
additional covariates were included in the model;
however, these SNPs were also significant in the
matching covariates model (Tables 3 and 4). We
also found two nominally significant SNPs that
exhibited possibly non-additive effects: ss66392234
Table 10. GWAS for SNPs that affect the relationship between CEB and lifespan: summary of
significant SNPs in Model 4 (full sample) (imputed SNPs)

P-values (genotype × CEB)

Ssid        Rsid       Chr  Position  Near    Model 4a    Model 4b    Model 4c    Model 4d
ss66450977  rs6768456  3    27867272  EOMES   1.40E-08 a  6.30E-09 a  4.30E-09 a  3.87E-07
ss66475987  rs2575533  4    42432336  ATP8A1  1.02E-05    5.80E-06    5.40E-06    2.30E-05

n = 1810 women. The chromosome (Chr) and position information provided below correspond to the GRCh37.p5 genome assembly, genome build 37.3.
a SNP attained genome-wide significance.



on Chromosome 12 (in HELB) and ss66500131 
on Chromosome 9 (close to the pseudogene 
OR7E31P) (Table 5). Nearby genes/pseudogenes 
were determined based on a radius of 150 kb from 
each SNP. 
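
The 150 kb proximity rule can be illustrated with a minimal sketch. It assumes that distance is measured from the SNP to the nearest edge of a gene, which the text does not specify. The annotation records (GENE_A, GENE_B, GENE_C) and their coordinates are hypothetical placeholders, not real gene positions; only the SNP position is taken from Table 9.

```python
from typing import NamedTuple

RADIUS_BP = 150_000  # 150 kb radius, as stated in the text


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # first base of the gene (inclusive)
    end: int    # last base of the gene (inclusive)


def distance_to_gene(pos: int, gene: Gene) -> int:
    """Distance in bp from a SNP to the nearest gene edge (0 if inside the gene)."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))


def genes_near(chrom: str, pos: int, genes: list[Gene],
               radius: int = RADIUS_BP) -> list[tuple[str, int]]:
    """Return (gene name, distance) pairs within `radius`, nearest first."""
    hits = [(g.name, distance_to_gene(pos, g)) for g in genes if g.chrom == chrom]
    return sorted((h for h in hits if h[1] <= radius), key=lambda h: h[1])


# Hypothetical annotation: names and coordinates are invented for illustration.
annotation = [
    Gene("GENE_A", "3", 27_700_000, 27_760_000),
    Gene("GENE_B", "3", 28_200_000, 28_260_000),
    Gene("GENE_C", "4", 27_800_000, 27_900_000),
]

# SNP position from Table 9 (ss66450977, chromosome 3).
print(genes_near("3", 27_867_272, annotation))
```

Measuring to the nearest edge rather than to the gene midpoint is a design choice of this sketch; a midpoint rule would exclude long genes whose edge lies inside the radius.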

In the split-sample analysis using imputed SNP
data (see the 'Methodology' section for details
on imputation), no SNPs were found to be significant
for females (Tables 6-8), even when the randomization
used in the split-sample assignment
was replicated 100 times. We verified that using
the imputed data for the full-sample analysis would
have yielded comparable levels of significance for
the two SNPs previously discovered in Models 1-5
(Tables 9 and 10).
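
A replicated split-sample check of this kind might be organized as in the sketch below. It is not the procedure used in the study: the equal split into two halves, the requirement that a SNP pass the threshold in both halves, and the callable `test_snp` (a stand-in for whatever per-sample association test is run) are all assumptions made for illustration.

```python
import random

THRESHOLD = 1.13e-7  # genome-wide threshold quoted in the text


def split_sample(ids, seed, first_fraction=0.5):
    """Randomly partition subject IDs into two disjoint subsamples."""
    rng = random.Random(seed)  # seeded so each replicate is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * first_fraction)
    return shuffled[:cut], shuffled[cut:]


def replicated_hits(ids, test_snp, n_reps=100, threshold=THRESHOLD):
    """Count replicates in which a SNP passes the threshold in both halves.

    `test_snp` is a hypothetical callable that takes a list of subject IDs
    and returns the P-value of the SNP's association in that subsample.
    """
    count = 0
    for seed in range(n_reps):
        first, second = split_sample(ids, seed)
        if test_snp(first) < threshold and test_snp(second) < threshold:
            count += 1
    return count


# Placeholder test that never reaches significance; a real analysis would
# fit the association model on each subsample instead.
subjects = range(1810)  # n = 1810 women, as in the tables
print(replicated_hits(subjects, test_snp=lambda sample: 1.0))
```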

No significant SNPs were detected for males in
Models 1-3. As in the GWAS for females, the addition
of more covariates decreased levels of significance,
and therefore no further models were run.

No significant SNPs were detected in a model that 
included a quadratic effect of CEB. Further details on 
the GWAS for females are in the Supplementary 
Information. 
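
To make concrete what a genotype x CEB effect means, the following sketch fits a separate least-squares slope of lifespan on CEB within each genotype class. This is only an illustration of the concept: the models in the study test an interaction term and include covariates, and the records used here are synthetic values constructed so that the slope changes with genotype.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def slope_by_genotype(records):
    """Map each genotype to the slope of lifespan on CEB among its carriers.

    `records` is an iterable of (genotype, ceb, lifespan) tuples.
    """
    groups = {}
    for genotype, ceb, lifespan in records:
        xs, ys = groups.setdefault(genotype, ([], []))
        xs.append(ceb)
        ys.append(lifespan)
    return {g: ols_slope(xs, ys) for g, (xs, ys) in sorted(groups.items())}


# Synthetic data (not from the study): each extra copy of the minor allele
# makes the lifespan cost per child one year steeper.
records = [(g, ceb, 80.0 - g * ceb) for g in (0, 1, 2) for ceb in range(5)]

for genotype, slope in slope_by_genotype(records).items():
    print(f"genotype {genotype}: slope of lifespan on CEB = {slope:+.2f}")
```

If the slopes were equal across genotype classes, there would be no interaction to detect, whatever the main effects of genotype or CEB.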

CONCLUSIONS AND IMPLICATIONS 

Phenotypic and genetic correlations 

The phenotypic correlation between CEB and
lifespan in women differed with birth year,
demonstrating the importance of phenotypic plasticity
for the relationships among life-history traits.
Secular cultural and environmental changes affect
that correlation and probably account for much of
the variation among studies [6, 15, 19, 21, 22]. The
estimate of a negative genetic correlation in women
when not accounting for covariates (r_G = -0.88) was
large. The effects of shared environment reduced the
strength of the linear correlation and increased the
strength of the quadratic correlation, and education
mimicked the effects of a cost of reproduction in that
a higher level of education was associated with
both fewer children and longer life: including education
decreased the estimate of the genetic
correlation.

Some of our genetic correlation estimates were
below -1. This indicates that the estimated variance
component is negative, which is known to be a possible
result of REML estimation [42].
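
One way to see how an estimate can fall below -1 is through the standard definition r_G = cov_A(x, y) / sqrt(V_A(x) V_A(y)). The sketch below uses illustrative numbers, not estimates from this study. When the estimated covariance exceeds the geometric mean of the two variances in magnitude, the ratio leaves [-1, 1] and the 2 x 2 genetic covariance matrix has a negative eigenvalue, that is, a negative estimated variance along some combination of the two traits.

```python
import math


def genetic_correlation(cov_a, var_a_x, var_a_y):
    """r_G = cov_A(x, y) / sqrt(V_A(x) * V_A(y))."""
    if var_a_x <= 0 or var_a_y <= 0:
        raise ValueError("additive genetic variances must be positive")
    return cov_a / math.sqrt(var_a_x * var_a_y)


def min_eigenvalue_2x2(var_x, var_y, cov):
    """Smallest eigenvalue of the symmetric matrix [[var_x, cov], [cov, var_y]]."""
    mean = (var_x + var_y) / 2.0
    half_diff = (var_x - var_y) / 2.0
    return mean - math.sqrt(half_diff ** 2 + cov ** 2)


# Illustrative values only: the covariance is larger in magnitude than
# sqrt(V_A(x) * V_A(y)), as can happen with unconstrained estimates.
var_x, var_y, cov = 1.0, 1.0, -1.1

print(f"r_G = {genetic_correlation(cov, var_x, var_y):.2f}")
print(f"smallest eigenvalue = {min_eigenvalue_2x2(var_x, var_y, cov):.2f}")
```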

When we controlled for the effects of smoking,
education, country of origin and maternal effects,
the correlation was still negative (r_G = -0.69) yet
no longer significant. This mirrors the pattern we
observed in the GWAS: as covariates were
introduced into the model, associations lost
significance.

The mean pedigree depth of 1.02 implies that our 
pedigree is dominated by parent-offspring relationships.
This may result in some difficulty distinguishing
parental, environmental and additive genetic effects. 
For example, cultural and lifestyle habits that are 
unique to nuclear families (such as diet) are known 
to affect lifespan, but these habits are not recorded, 
and therefore the genetic correlations that we see may 
be confounded by these unobservable factors. 

One can only find a genetic correlation when the 
phenotypic correlation is significant, and one can 
only find significant effects of SNPs on a phenotypic 
correlation when it differs from zero. Our chain of 
inference thus depends on genetic effects not being 
too masked by phenotypic plasticity. 

Gene functions 

We found several SNPs with nominally significant
effects on the correlation of CEB with post-reproductive
lifespan; two of them are near EOMES and
RAD51B, genes that are related to cancer when
under-expressed. The effect of the SNP close to
EOMES reached genome-wide significance. The
EOMES gene has been associated with multiple
sclerosis and bladder cancer [43, 44]. RAD51B, a
gene involved in encoding proteins that participate
in DNA repair, has been linked to breast cancer and
brain cancer [45-48]. Further details on the genes in
proximity to the SNPs found significant in our GWAS
are included in the Supplementary Information.
Although these SNPs were close in physical distance
to their respective genes (<130 kb), further study of
linkage disequilibrium would help to understand
their possible association.

Other studies 

Voorhuis et al. [49] collated the results of many genetic
studies of age at natural menopause. None of
the SNPs that we discovered were found in the
studies included in their summary.

Several other recent genetic studies relate fertility
to genotype. Kosova et al. [50] found 41 SNPs
(P < 10^-4) that were associated with decreased male
fertility. Adachi et al. [51] found 36 SNPs (P < 10^-4)
with possible links to endometriosis in Japanese females.
Both were GWAS that did not find any
genome-wide significant SNPs. Murray et al. [52] reported
confirmations for four SNPs previously
identified as associated with age at menopause.
Ewens et al. [53] examined 15 SNPs linked with obesity
to evaluate possible associations with polycystic
ovary syndrome, the cause of a form of infertility in
women; only one SNP had a nominal level of significance,
and the significance did not hold up in another
case-control study. Our methods differ
fundamentally from these four studies in that we
considered lifespan in conjunction with fertility,
and the significant SNPs we found were not reported
in their analyses [50-53].

Although the Kuningas Rotterdam study
incorporated mortality in its analysis and was therefore
more similar to our study [35], it differs from our
approach in three ways: (i) our analysis included
many more SNPs (444 205 versus their 1664), (ii)
we adjusted for the effects of several direct mortality-affecting
covariates such as smoking and SBP, and
(iii) Kuningas used an initial screening of the 1664
SNPs with a set-based test (with a threshold of
P < 0.05), whereas we started with a GWAS across
444 205 SNPs in models that relate each SNP to
both CEB and lifespan (with a threshold of
P < 1.13 x 10^-7). We did not find Bonferroni-level
significance with SNPs near the four gene regions
identified in [35].
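
The quoted threshold is consistent with a Bonferroni correction over the 444 205 SNPs tested, as the short check below shows. The family-wise error rate of 0.05 is an assumption (it is not stated in this section), chosen because it reproduces 1.13 x 10^-7. The P-values are transcribed from Table 9, and the function names are illustrative.

```python
N_SNPS = 444_205  # number of SNPs tested, from the text
ALPHA = 0.05      # assumed family-wise error rate; not stated in this section


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


THRESHOLD = bonferroni_threshold(ALPHA, N_SNPS)

# Genotype x CEB P-values transcribed from Table 9 (Model 4 is in Table 10).
TABLE_9 = {
    "rs6768456": {"Model 1": 2.91e-10, "Model 2": 2.20e-10,
                  "Model 3": 6.44e-09, "Model 5": 5.56e-07},
    "rs2575533": {"Model 1": 1.50e-07, "Model 2": 6.57e-08,
                  "Model 3": 5.03e-06, "Model 5": 2.94e-05},
}


def genome_wide_hits(table, threshold):
    """List (rsid, model) pairs whose P-value is below the threshold."""
    return [(rsid, model)
            for rsid, models in table.items()
            for model, p in models.items()
            if p < threshold]


print(f"threshold = {THRESHOLD:.3g}")
for rsid, model in genome_wide_hits(TABLE_9, THRESHOLD):
    print(f"{rsid} reaches genome-wide significance in {model}")
```

Under this threshold the pairs flagged by the check coincide with the entries marked "a" in Table 9.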



Summary 

We have analysed phenotypic and genetic correlations
between reproductive success and survival
and have identified a small set of genes that may
mediate a trade-off between them. This warrants further
studies in other samples.

The Framingham dataset has some shortcomings.
In particular, women born before the start
of the study would only have been included
in the study if they survived until 1948-52
(when the study began). Therefore, our dataset
does not include anyone who died during
World War I, the 1918 flu pandemic, the
Great Depression or World War II. If these catastrophic
events affected women differently depending
on their fertility and lifespan, then excluding
these women from our analysis would bias our
results. The issue is inherent in such observational
studies of humans, and unfortunately cannot be
avoided.

We failed to find any significant SNPs when 
covariates (i.e. smoking, country of origin and 
average cholesterol levels) were included and 
when we did a rough check for consistency out
of sample. It is unknown how often such checks
modify the significance of SNP associations, because many
other published GWAS do not account for
the effects of covariates or perform out-of-sample
predictions.